Relative Size of Certain Polynomial Time Solvable
Subclasses of Satisfiability

J.  Franco 


Abstract. We determine, according to a certain measure, the relative sizes
of several well-known polynomially solvable subclasses of SAT. The measure
we adopt is the probability that randomly selected k-SAT formulas belong to
the subclass of formulas in question. This probability is a function of the ratio
r of clauses to variables and we determine those ranges of this ratio that result
in membership with high probability.

We show, for any fixed r > 4/(k(k - 1)), the probability that a random
formula is SLUR, q-Horn, extended Horn, CC-balanced, or renamable Horn
tends to 0 as n → ∞. We also show that most random unsatisfiable formulas
are not members of one of these subclasses.


1.  Introduction 

The Satisfiability problem (SAT) is to determine whether there exists a satisfying
truth assignment for a given Boolean expression. This problem is NP-complete,
thus there is no known polynomial-time algorithm for solving it. Because of the
importance of SAT in logic, artificial intelligence, and operations research, considerable
effort has been spent to determine how to cope with this disappointing reality.
Two  approaches  are:  1)  determine  whether  there  exist  algorithms  for  SAT  which 
usually  present  a  result  in  polynomial  time;  2)  identify  special  subclasses  of  SAT 
that  can  be  solved  in  polynomial  time.  This  paper  is  concerned  with  the  second 
approach. 

In this paper we determine, according to a certain measure, the relative sizes of
several well-known polynomially solvable subclasses of SAT. The measure we adopt
is the probability that randomly selected formulas, drawn from a family of probability
spaces, belong to the subclass of formulas in question. More specifically, the
measure is the parameter value on the probability space for which the probability
of membership tends to 0 in the limit.

Some  notable  polynomial  time  solvable  subclasses  of  SAT  (see  Section  2  for 
definitions)  are: 

1.  Horn  [13,  20,  24], 

2. extended Horn [6],

3. CC-balanced [11],

4.  SLUR  (Single  Lookahead  Unit  Resolution)  solvable  [23], 

5.  q-Horn  [3,  4]. 

Below,  we  refer  to  these  as  the  well-known  polynomial  time  solvable  subclasses. 

We  will  not  be  concerned  with  various  hierarchical  subclasses  of  SAT  [12,  15, 
17,  18,  21],  even  though  portions  of  them  are  also  solvable  in  polynomial  time. 
Except for the pure implicational hierarchy [15], the best known complexity of
these classes is O(n^k), where k reflects the level of a hierarchy; therefore, it is likely
that such hierarchies are not efficiently solved for any but the first few levels. Low
expressibility is the main factor in ignoring the pure implicational hierarchy. Also
2-CNF  is  polynomial  time  solvable  [1,  14],  but  we  shall  not  be  concerned  with 
random  2-CNF  formulas  in  this  paper. 

We are interested in the relative sizes of the subclasses above primarily because
of the results of Boros et al. [4], which suggest that the class of q-Horn formulas is
close to what might be regarded as the largest easily expressible subclass of SAT
that can be solved by a polynomial time, uniform algorithm. They formulate a
set of linear constraints, based on the input formula (with n variables) and a real
parameter Z, and show that, for any fixed c > 0, the class of formulas that satisfies
the constraints with $Z = 1 + c\,(\ln n)/n$ can be solved in polynomial time. In addition,
the class that satisfies the constraints with Z = 1 is precisely the q-Horn class (see
Definition 2.7). On the other hand, for any β < 1, the class of formulas that satisfy
these constraints with $Z = 1 + 1/n^{\beta}$ is NP-complete.

To measure size we use a well-known probability distribution
known as the constant clause-width model, defined over the sample space of
k-CNF formulas on a set of n propositional variables. The clause
space for the model consists of the $2^k\binom{n}{k}$ clauses with k literals such that no two
literals are based on the same variable. The formula space consists of all multisets of
m clauses. Each multiset has equal probability, which is $\left(2^k\binom{n}{k}\right)^{-m}$. Thus clauses
are generated by sampling variables without replacement, while formulas are generated by
sampling clauses with replacement. This paper restricts attention to k ≥ 3. Probability
spaces will frequently be grouped according to the ratio r = m/n.
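The sampling scheme just described can be illustrated with a minimal sketch in Python; it is not part of the original development. The encoding of literals as signed integers (+j for variable j, -j for its complement) and the names `random_clause` and `random_formula` are assumptions made only for this example.

```python
import random

def random_clause(n, k, rng):
    """Draw one clause uniformly from the clause space.

    k distinct variables are chosen without replacement and each is
    complemented independently with probability 1/2.
    Assumed encoding: +j is variable j, -j is its complement.
    """
    variables = rng.sample(range(1, n + 1), k)
    return frozenset(v if rng.random() < 0.5 else -v for v in variables)

def random_formula(m, n, k, seed=None):
    """Draw m clauses independently, that is, with replacement."""
    rng = random.Random(seed)
    return [random_clause(n, k, rng) for _ in range(m)]

# Example: m = 8 clauses of width k = 3 over n = 20 variables (r = 0.4).
formula = random_formula(m=8, n=20, k=3, seed=1)
```

Because clauses are drawn independently, the same clause may appear more than once in a formula, which is why the formula is kept as a list rather than a set.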

We determine those regions of the parameter space (m, n, k) over which a random
formula has low probability of being in a certain subclass, such as q-Horn, etc.
We use this approach because

1.  several  of  the  subclasses  considered  are  incomparable; 

2.  the  ratio  r  =  m/n  provides  a  scale  which  has  been  shown,  both  theoretically 
and  experimentally,  to  measure  the  hardness  of  formulas; 

3.  many  results  already  proven  for  the  formula  distribution  may  be  used  to  add 
dimension  to  the  results  presented  here. 

Except for point 8, the following results for random formulas under the constant
clause-width model are known [5, 7, 8, 9, 10, 16, 19].

1. For any fixed $r > 0.65 \cdot 2^k$, the probability that a random formula is unsatisfiable
tends to 1 as n → ∞.

2. For any fixed $r > 0.65 \cdot 2^k$, there is no known algorithm that will verify unsatisfiability
of a random formula in polynomial time with probability tending
to 1 as n → ∞.

3. For any fixed $r < 0.25 \cdot 2^k/k$, the probability that a random formula is satisfiable
tends to 1 as n → ∞.




4. For any fixed $r < 0.25 \cdot 2^k/k$, with probability tending to 1 as n → ∞, a
random formula that is satisfiable can be solved in polynomial time by an
iterative variable elimination algorithm that relies primarily on choosing
variables for elimination from a shortest clause.

5. For any fixed r < 1.63, a random 3-CNF formula can be satisfied by repeated
application of the pure literal rule, with probability tending to 1 as n → ∞.

6. For any fixed r < 1, a random formula can be satisfied by applying any
algorithm for 2-SAT to the formula with all but 2 literals randomly removed
from each clause, with probability tending to 1 as n → ∞.

7. The average number of occurrences of a variable in a random formula is less
than 1 if r < 1/k.

8. The average number of cycles in a random formula is bounded from above
by a small constant if $r < 1/k^2$. A cycle in this context means an undirected
cycle in the graph formed by considering each clause as a node and
connecting each pair of clauses that share a variable. This result is proved
in Section 3.3.

The first two points above show where random formulas are “hard” and the
last six points show where random formulas are “easy.” Notice the progression
from very hard formulas (not easily solved by resolution) at $r = 0.65 \cdot 2^k$, r fixed,
to usually solvable in polynomial time at $r = 0.25 \cdot 2^k/k$ by non-trivial heuristics, to
easily solvable by a 2-SAT algorithm at r = 1, to very easily solvable, since variables
usually occur one time in a formula, at r = 1/k, to trivially solvable due to no or
few cycles at $r = 1/k^2$. Thus, the constant clause-width model may be thought of as a generator of formulas
of hardness controlled by the ratio r. We wish to see where the well-known classes
fall on this scale.
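The scale just described can be tabulated for a particular clause width. The sketch below only evaluates the threshold expressions quoted in points 1 through 8, reading the two largest thresholds as 0.65 · 2^k and 0.25 · 2^k/k; the function names and dictionary keys are illustrative assumptions, and no new result is implied.

```python
def ratio_scale(k):
    """Thresholds on r = m/n quoted in the list of known results, for width k."""
    return {
        "few_cycles_below": 1 / k**2,                 # point 8
        "sparse_occurrences_below": 1 / k,            # point 7
        "classes_fail_above": 4 / (k * (k - 1)),      # result of this paper
        "two_sat_reduction_below": 1.0,               # point 6
        "satisfiable_below": 0.25 * 2**k / k,         # points 3 and 4
        "unsatisfiable_above": 0.65 * 2**k,           # points 1 and 2
    }

def mean_occurrences(k, r):
    """Average number of occurrences of a variable: k*m/n = k*r."""
    return k * r

for name, value in ratio_scale(4).items():
    print(f"{name:28s} {value:.4f}")
```

For instance, at r = 4/(k(k - 1)) the average number of occurrences of a variable is `mean_occurrences(k, 4 / (k * (k - 1)))`, which equals 4/(k - 1).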

In  this  paper  we  present  the  following  result.  Definitions  appear  in  Section  2. 

• For any fixed r > 4/(k(k - 1)), the probability that a random formula is
SLUR, q-Horn, extended Horn, CC-balanced, or renamable Horn tends to 0
as n → ∞.

Therefore, the well-known polynomial time solvable subclasses of SAT, by our measure,
do not represent most “easy” formulas for a wide range of values of r and are
much smaller than other classes of formulas that, as an aggregate, are easily solved
with high probability. It is interesting to note that the probability that a random
formula $\mathcal{F}$ is in one of the well-known subclasses tends to 0 as n → ∞, unless the
average number of occurrences of a variable in $\mathcal{F}$ is less than 4/(k - 1), a very small
number. In addition, our results show that most random unsatisfiable formulas are
not members of one of the well-known subclasses.

2. Polynomial Time Solvable Subclasses of SAT

We specify SAT for the purposes of this paper as follows. Let $V = \{v_1, \ldots, v_n\}$
be a set of n Boolean variables. Let $L_n = \{v_1, \bar{v}_1, \ldots, v_n, \bar{v}_n\}$ be a set of n positive
and n negative literals over variables in V. A truth assignment to the literals $L_n$
is a mapping $t : L_n \to \{T, F\}$ such that $t(\bar{v}) = T$ if and only if $t(v) \ne T$. A
subset of the literals $L_n$ is called a clause. A clause C has truth value T under a truth
assignment t if and only if some literal in C is assigned T. A collection (multiset) of
clauses, $\{C_1, C_2, \ldots, C_m\}$, is a formula in Conjunctive Normal Form (CNF). From
now on, it is understood that formula means formula in Conjunctive Normal Form.
A formula $\mathcal{F}$ is satisfiable if and only if there exists a truth assignment t such
that every clause in $\mathcal{F}$ has truth value T under t. Such a t is said to satisfy $\mathcal{F}$.
The objective of an algorithm for SAT is to determine whether a given formula is
satisfiable.

It will be useful to represent a formula $\mathcal{F}$ as an m × n (0, ±1)-matrix.

Definition 2.1. Given a formula $\mathcal{F}$, its clause-variable matrix, denoted as
$M_{\mathcal{F}}$, is the m × n matrix in which element (i, j) has the value +1 if clause $C_i$ has
literal $v_j$, has the value -1 if clause $C_i$ has literal $\bar{v}_j$, and has the value 0 otherwise.

The  remainder  of  this  section  defines  certain  subclasses  of  SAT  that  are  solved 
in  polynomial  time. 

Definition 2.2. A formula $\mathcal{F}$ is Horn if and only if every row of $M_{\mathcal{F}}$ has at
most one +1 value.

Horn  formulas  can  be  solved  in  linear  time  by  unit  resolution  [13,  20,  24]. 
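Definitions 2.1 and 2.2 translate directly into a few lines of code. The following is a minimal sketch in Python, assuming clauses are given as sets of signed integers (+j for variable j, -j for its complement); the function names are illustrative and the check shown is the definition itself, not the linear-time solution method cited above.

```python
def clause_variable_matrix(formula, n):
    """Build the m x n (0, +1, -1) clause-variable matrix of Definition 2.1."""
    rows = []
    for clause in formula:
        row = [0] * n
        for lit in clause:
            row[abs(lit) - 1] = 1 if lit > 0 else -1
        rows.append(row)
    return rows

def is_horn(matrix):
    """Definition 2.2: every row has at most one +1 entry."""
    return all(sum(1 for entry in row if entry == 1) <= 1 for row in matrix)

# (v1 or not v2), (not v1 or not v2 or v3), (not v3)
example = [{1, -2}, {-1, -2, 3}, {-3}]
matrix = clause_variable_matrix(example, 3)
```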

Definition 2.3. (Lewis [22]) A formula $\mathcal{F}$ is renamable Horn if and only
if multiplying each of some subset of columns of $M_{\mathcal{F}}$ by -1 yields a matrix
corresponding to a Horn formula.

Renamable  Horn  formulas  can  also  be  solved  in  linear  time  [2] . 
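For very small formulas, Definition 2.3 can be checked by trying every subset of columns. The sketch below is a brute-force illustration only: it takes exponential time in the number of variables and is not the linear-time method of [2]. The function name and the matrix representation (a list of rows with entries 0, +1, -1) are assumptions for this example.

```python
from itertools import product

def is_renamable_horn(matrix):
    """Brute-force test of Definition 2.3 (exponential in the number of columns).

    Tries every way of multiplying a subset of columns by -1 and checks
    whether some choice leaves at most one +1 entry in every row.
    """
    n = len(matrix[0]) if matrix else 0
    for signs in product((1, -1), repeat=n):
        if all(
            sum(1 for entry, s in zip(row, signs) if entry * s == 1) <= 1
            for row in matrix
        ):
            return True
    return False
```

For example, the formula with clauses {v1, v2} and {¬v2, ¬v3} is not Horn but becomes Horn after complementing v1, whereas the pair {v1, v2, v3}, {¬v1, ¬v2, ¬v3} cannot be renamed to a Horn formula.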

Extended  Horn  formulas  can  be  expressed  as  linear  inequalities  for  which  0-1 
solutions  can  always  be  found  (if  one  exists)  by  rounding  a  real  solution  obtained 
using  an  LP  relaxation  [6].  We  find  an  alternative  characterization  is  easier  to 
understand. 

Definition 2.4. Given a formula $\mathcal{F}$, let R be a rooted directed tree in which
each edge is labeled with a different variable from the set V.

A clause C is an extended Horn clause w.r.t. R if the positive literals of C
correspond to a (possibly empty) directed path P in R, and the set of negative
literals in C corresponds to a set of directed paths $N_1, N_2, \ldots, N_t$ of R, and exactly
one of the following conditions holds:

1. $N_1, N_2, \ldots, N_t$ start at the root s.

2. $N_1, N_2, \ldots, N_{t-1}$ (say), start at the root s, and $N_t$ starts at a vertex q ≠ s.
Moreover, if P is not empty, it also starts at q.

A formula is an extended Horn formula w.r.t. R if each of its clauses is an extended
Horn clause w.r.t. R. A formula is an extended Horn formula if it is an extended
Horn formula w.r.t. some such rooted directed tree R.

One  tree  R  for  a  given  Horn  formula  is  a  star  (one  root  and  all  leaves  with  an 
edge  for  each  variable  in  the  formula).  Hence,  the  class  of  extended  Horn  formulas 
is  a  generalization  of  the  class  of  Horn  formulas. 

Unsatisfiable extended Horn formulas can be recognized in polynomial time
by an algorithm based on unit-resolution plus rounding [6]. Therefore, if a formula
is known to be extended Horn a priori, it can be solved in polynomial time. However,
there is no known polynomial time algorithm for recognizing extended Horn
formulas.

The  following  class  is  also  rooted  in  Linear  Programming. 

Definition 2.5. A formula $\mathcal{F}$ is CC-balanced if in every submatrix of $M_{\mathcal{F}}$ with
exactly two nonzero entries per row and per column, the sum of the entries is a
multiple of four [25].

The motivation for studying CC-balanced formulas is the question, for SAT,
when do Linear Programming relaxations have integer solutions? CC-balanced
formulas can be recognized and solved in polynomial time [11].

The class SLUR, for Single Lookahead Unit Resolution, is peculiar in that
it is defined by an algorithm, and not by structural properties of formulas. The
algorithm defining SLUR is given below. In it, the function unitprop($\mathcal{F}$) returns the
result of performing the well-known unit clause simplification until no unit clauses
remain in the formula. It also returns the set of unit clauses found and derived. It
is known that unitprop can be implemented in time linear in $|\mathcal{F}|$ [12].

Algorithm SLUR($\mathcal{F}$)

Input: A CNF formula $\mathcal{F}$ with no empty clause
Output: A satisfying partial truth assignment for the variables in $\mathcal{F}$,
    or “unsatisfiable”, or “give up”

Initialize $\mathcal{F}$ := unitprop($\mathcal{F}$).
Initialize t := the set of unit clauses returned by unitprop.
If ∅ ∈ $\mathcal{F}$ then
    Output “unsatisfiable” and halt.
While $\mathcal{F}$ is not empty do the following:
    Select a variable v appearing as a literal of $\mathcal{F}$.
    Set $\mathcal{F}_1$ := unitprop($\mathcal{F} \cup \{\{v\}\}$).
    Set $t_1$ := unit clauses returned by unitprop.
    Set $\mathcal{F}_2$ := unitprop($\mathcal{F} \cup \{\{\bar{v}\}\}$).
    Set $t_2$ := unit clauses returned by unitprop.
    If ∅ ∈ $\mathcal{F}_1$ and ∅ ∈ $\mathcal{F}_2$ then
        Output “give up” and halt.
    Otherwise, if ∅ ∉ $\mathcal{F}_1$, then
        Set $\mathcal{F}$ := $\mathcal{F}_1$.
        Set t := t ∪ $t_1$.
    Otherwise,
        Set $\mathcal{F}$ := $\mathcal{F}_2$.
        Set t := t ∪ $t_2$.
    (Continue the loop.)
Output t.

End Algorithm SLUR

Definition  2.6.  A  formula  is  in  the  class  SLUR  if,  for  all  possible  sequences 
of  selected  variables,  algorithm  SLUR  does  not  give  up. 
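To make the control flow of the algorithm concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the linear-time implementation discussed in the text: clauses are sets of signed integers, `unitprop` is a simple quadratic-time loop, and the `choose` argument (hypothetical) fixes one particular sequence of selected variables. Membership in the class SLUR, by Definition 2.6, requires that no sequence leads to “give up”, which this single run does not establish.

```python
def unitprop(formula):
    """Apply unit clause simplification until no unit clause remains.

    Returns (simplified clauses, set of literals set true along the way).
    An empty frozenset in the result is the empty (falsified) clause.
    """
    clauses = [frozenset(c) for c in formula]
    fixed = set()
    while True:
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit is None:
            return clauses, fixed
        (lit,) = unit
        fixed.add(lit)
        # Drop clauses satisfied by lit; delete the falsified literal elsewhere.
        clauses = [c - {-lit} for c in clauses if lit not in c]

def slur(formula, choose=None):
    """One run of algorithm SLUR for one sequence of selected variables."""
    choose = choose or (lambda cls: abs(next(iter(cls[0]))))
    empty = frozenset()
    clauses, t = unitprop(formula)
    if empty in clauses:
        return "unsatisfiable"
    while clauses:
        v = choose(clauses)
        f1, t1 = unitprop(clauses + [frozenset([v])])
        f2, t2 = unitprop(clauses + [frozenset([-v])])
        if empty in f1 and empty in f2:
            return "give up"
        if empty not in f1:
            clauses, t = f1, t | t1
        else:
            clauses, t = f2, t | t2
    return t
```

On the four clauses {v1, v2}, {v1, ¬v2}, {¬v1, v2}, {¬v1, ¬v2}, which contain no unit clause, both branches on either variable produce the empty clause, so the run gives up.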

Algorithm  SLUR  takes  linear  time  with  the  modification,  due  to  Truemper  [25], 
that  unit  resolution  (in  unitprop)  be  applied  simultaneously  to  both  branches  of  a 
selected  variable,  abandoning  one  branch  if  the  other  finishes  first  without  falsifying 
a clause. Note that due to the definition of this class, the question of class recognition
is avoided. The class SLUR was developed as a generalization of other classes
including  Horn,  renamable  Horn,  extended  Horn,  and  CC-balanced  formulas  [23]. 
The  class  q-Horn  was  developed  in  [3,  4]. 

Definition 2.7. Let $\{v_1, v_2, \ldots, v_n\}$ be a set of Boolean variables. For clause
$C_i$, let $P_i$ be the set of indices of its positive literals and let $N_i$ be the set of indices
of its negative literals. Construct the following system of inequalities:

$$\sum_{j \in P_i} \alpha_j + \sum_{j \in N_i} (1 - \alpha_j) \le Z, \quad (i = 1, 2, \ldots, m), \text{ and} \qquad (1)$$

$$0 \le \alpha_j \le 1, \quad (j = 1, 2, \ldots, n), \qquad (2)$$

where $Z \in \mathbb{R}$. If all these constraints can be satisfied with Z = 1, then the formula
is q-Horn.
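The left-hand side of inequality (1) is easy to evaluate for a candidate vector of α values. The sketch below, in Python, computes the smallest Z that a given α requires; it does not solve the linear program, so it can only certify that a formula is q-Horn when a suitable α is supplied. The names `clause_load` and `smallest_z` and the dictionary representation of α are assumptions for this example.

```python
def clause_load(clause, alpha):
    """Left-hand side of inequality (1) for one clause.

    clause: set of signed integers (+j positive literal, -j negative literal).
    alpha:  dict mapping variable index j to a value in [0, 1].
    """
    return sum(alpha[abs(l)] if l > 0 else 1 - alpha[abs(l)] for l in clause)

def smallest_z(formula, alpha):
    """Smallest Z for which this particular alpha satisfies every inequality (1)."""
    if not all(0 <= a <= 1 for a in alpha.values()):   # inequality (2)
        raise ValueError("each alpha_j must lie in [0, 1]")
    return max(clause_load(c, alpha) for c in formula)

# A Horn formula with alpha_j = 1 for all j, and a 2-CNF formula with alpha_j = 1/2.
horn = [{1, -2, -3}, {-1, -2}, {3}]
two_cnf = [{1, 2}, {-1, 2}, {-1, -2}]
z_horn = smallest_z(horn, {1: 1, 2: 1, 3: 1})
z_two = smallest_z(two_cnf, {1: 0.5, 2: 0.5})
```

Both choices give Z = 1, consistent with Horn and 2-CNF formulas being q-Horn: setting every α to 1 charges a clause only for its positive literals, and setting every α to 1/2 charges each literal 1/2.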

We may also characterize this class as a special case of monotone decomposition
of matrices [25]. Given formula $\mathcal{F}$, the monotone decomposition of $M_{\mathcal{F}}$ consists
of multiplying some columns by -1 and moving rows and columns to form the
following partition into submatrices:

$$\begin{pmatrix} A^1 & E \\ D & A^2 \end{pmatrix}$$

where the submatrix $A^1$ has at most one +1 entry per row, the submatrix D contains
only -1 or 0 entries, the submatrix E has only 0 entries, and the submatrix
$A^2$ has no restrictions. Below, we will be concerned with the maximum monotone
decomposition where matrix $A^1$ is the largest possible. Maximum monotone
decompositions are essentially unique [25].

Definition 2.8. If the maximum monotone decomposition of $M_{\mathcal{F}}$ is such that
$A^2$ has no more than two nonzero entries per row, then $\mathcal{F}$ is q-Horn.

Recognition of q-Horn formulas is made easy by the fact that monotone decomposition
can be carried out in linear (O(m + n)) time [25]. Once a q-Horn formula
$\mathcal{F}$ is in its decomposed form it can be solved in linear time as follows. Treat submatrix
$A^1$ as a Horn formula and solve it in linear time using a method that returns
a minimum, unique truth assignment for the formula with respect to true [13, 20].
If the Horn formula is unsatisfiable, then $\mathcal{F}$ is unsatisfiable. Otherwise, remove all
rows satisfied by the unique minimum truth assignment. Solve what is left of submatrix
$A^2$ by a 2-SAT algorithm [1, 14]. If a satisfying assignment is found, it may
be combined with the unique minimum assignment above to give an assignment
satisfying $\mathcal{F}$. Otherwise, $\mathcal{F}$ is not satisfiable.

In  the  analysis  below  we  will  not  directly  consider  some  of  the  classes  defined 
above  because  they  are  subclasses  of  either  SLUR  or  q-Horn.  However,  the  classes 
of  SLUR,  and  q-Horn  formulae  are  incomparable  as  the  following  examples  show. 

Example  2.9.  Any  formula  V2j  ^2?  I’s}  •  •  •  is  not  q-Horn.  To  see 

this,  construct  inequalities  as  in  (1)  and  (2)  for  the  first  two  clauses.  These  force 
a  I  >2  —  ZI2  which  requires  Z  >2, 

Example  2.10.  The  formula  {^i, '^2,  is  not  q-Horn  but  it  is  ob¬ 

viously  SLUR.  This  formula  can  easily  be  extended  to  less  trivial  SLUR  formulas 
that  are  not  q-Horn. 

Example  2.11.  The  formula  {vi,  V2,  ^2, ^2,  vq]  . . .,  where 

...  is  Horn  and  does  not  contain  vi  or  V2  is  q-Horn  but  not  SLUR. 

3.  Analysis 

We restate the definition of the constant clause-width model in terms of the notation of Section 2.
Let $\mathcal{C}_n^k$ be the set of all subsets of $L_n$ of size k such that no element of $\mathcal{C}_n^k$ contains
duplicate or complementary literals. Random formulas generated according to
the model contain m clauses selected uniformly, independently, and with replacement
from $\mathcal{C}_n^k$. We will be interested in the case k ≥ 3 since random formulas generated
with k = 2 are solved in linear time by existing 2-SAT algorithms [1, 14].

3.1.  SLUR  Analysis. 

Definition 3.1. For any even x ≥ 4, call a set of x clauses an equivalence
cycle if all but two literals can be removed from each clause, the variables can be
relabeled, and the clauses can be reordered in the following sequence

$$\{v_1, \bar{v}_2\}\{v_2, \bar{v}_3\} \ldots \{v_{x/2}, v_1\}\{\bar{v}_1, \bar{v}_{x/2+1}\} \ldots \{v_{x-1}, \bar{v}_1\},$$

where $v_i \ne v_j$ if $i \ne j$. Given an equivalence cycle $\mathcal{C} \subset \mathcal{F}$, if every clause $C \notin \mathcal{C}$
contains at most k - 2 literals that are the same as or complementary to the
removed literals of $\mathcal{C}$, and no two of the literals removed from $\mathcal{C}$ are the same or
complementary, then $\mathcal{C}$ is called a blocked equivalence cycle. The variable $v_1$ is
called the end variable of the cycle.

Lemma 3.2. If a formula $\mathcal{F}$ has a blocked equivalence cycle, then $\mathcal{F}$ is not
SLUR.

Proof: In algorithm SLUR, choose for elimination the variables removed from
the blocked equivalence cycle of $\mathcal{F}$; proceed down the search tree in the direction
corresponding to falsifying the literals in the blocked equivalence cycle. By hypothesis,
there will be no unit clauses or empty clauses and yet what's left will be
unsatisfiable due to the equivalence cycle. This violates the definition of SLUR. □

Theorem 3.3. Under the constant clause-width model, the probability that a random formula $\mathcal{F}$ is in
the class SLUR tends to 0 if r > 4c/(k^2 - k), c > 1 a constant.


Proof: We apply the second moment method to prove the theorem. Let $B_i$
denote the number of blocked equivalence cycles of size i in a random formula $\mathcal{F}$.
Let $x = \lfloor \ln^2 n \rfloor$ or $x = \lfloor \ln^2 n \rfloor + 1$, whichever is even. We find $E(B_x)$, the expected
number of blocked equivalence cycles of size x, and $E(B_x^2)$, the second moment of
$B_x$. We show that $E(B_x)$ grows at least as fast as $a^x$ for some a > 1, when r > 4/(k^2 - k). Then we show
$E(B_x^2) = E(B_x)^2(1 + o(1))$ under the same conditions. Therefore, by Chebyshev's
inequality,

$$\Pr(B_x = 0) \le \Pr(|B_x - E(B_x)| \ge E(B_x)) \le \frac{E(B_x^2) - E(B_x)^2}{E(B_x)^2} = o(1)$$

when r > 4/(k^2 - k).

First, we find $E(B_x)$. Pick x ≥ 4 clauses, and x - 1 variables. Arrange the
clauses with variables so as to construct an equivalence cycle where the end variable
of the first clause is repeated in the (x/2)th and (x/2 + 1)th clauses of the cycle. The
literal pattern of the two literals of each clause that cause it to be in the equivalence
cycle is fixed. The probability that the clauses in the sequence match their patterns
is

$$\prod_{i=0}^{x-1} \frac{2^{k-2}\binom{n-x-(k-2)i}{k-2}}{2^{k}\binom{n}{k}} \approx \left(\frac{k(k-1)}{4n^2}\right)^{x} \frac{(n-x)^{((k-2)x)}}{\left((n-2)^{(k-2)}\right)^{x}},$$

where $a^{(b)}$ is used to denote the product a(a - 1)(a - 2)...(a - b + 1). The probability
that any non-cycle clause does not have more than k - 2 literals taken from the
set of x - 1 chosen variables and their complements is, ignoring insignificant terms,
$1 - k(x/n)^{k-1}$. Hence, the probability that all cycle clauses match their patterns
and non-cycle clauses do not share more than k - 2 literals with the cycle clauses is

$$\left(\frac{k(k-1)}{4n^2}\right)^{x} \frac{(n-x)^{((k-2)x)}}{\left((n-2)^{(k-2)}\right)^{x}} \left(1 - k\left(\frac{x}{n}\right)^{k-1}\right)^{m-x}.$$

The number of ways to select x clauses is $m(m-1)(m-2)\cdots(m-x+1) = m^{(x)}$. The
number of ways to choose x - 1 variables is $n(n-1)(n-2)\cdots(n-x+2) = n^{(x-1)}$.
Therefore, ignoring insignificant terms for convenience of presentation,

$$E(B_x) = m^{(x)}\, n^{(x-1)} \left(\frac{k(k-1)}{4n^2}\right)^{x} \frac{(n-x)^{((k-2)x)}}{\left((n-2)^{(k-2)}\right)^{x}} \left(1 - k\left(\frac{x}{n}\right)^{k-1}\right)^{m-x}.$$

Since $m^{(x)} > m^x(1 - x/m)^x$, $n^{(x-1)} > n^{x-1}(1 - (x-1)/n)^{x-1}$, and $(n-x)^{((k-2)x)} > (n-kx)^{(k-2)x}$,

$$E(B_x) > \frac{1}{n}\left(\frac{k(k-1)m}{4n}\right)^{x}(1 - o(1)).$$

If m/n > 4c/(k^2 - k), where c is any constant greater than 1, $E(B_x) > (1 - o(1))\,c^x/n > a^x$
in the limit, for some a > 1.

Next, we find $E(B_x^2)$. Order all possible patterns of variable choices and clause
choices. There are $[m(m-1)\cdots(m-x+1)]\,[n(n-1)\cdots(n-x+2)]$ of these. Let $B_x = X_1 + X_2 + X_3 + \cdots$, where each $X_i$
is 1 if, for the ith pattern, there is a blocked equivalence cycle and is 0 otherwise.
Then $E(B_x^2) = \sum_{i,j} E(X_i X_j)$.

Suppose patterns i and j have q clauses in common. If q = 0 it is possible for
both patterns to co-exist in $\mathcal{F}$. But, if q > 0, they may not be able to co-exist: in
particular,  the  variable  assignments  at  the  clauses  shared  by  both  patterns  must 
agree.  The  number  of  different  possible  variable  patterns  supporting  consistent 
overlapping  cycles  is  no  greater  than  except  for  q  =  x  in  which  case  it  is 


^x  —  l 


. The probability that patterns i and j have q clauses in common is


(m  —  xy  "^x^ 


Given two consistent blocked equivalence cycles, the probability that both are in
$\mathcal{F}$ is (ignoring insignificant terms as above) no greater than

2a:— g— 1 


n 


=  ( 

Therefore, 

Y,E{XiXj) 


4n^ 


fc-i\ 
(i  -  ^  (^)''')  '

1  ^x\{m  —  xY~'^x^  (  4n*  '\  ^  (n  —  A 

^^^{m-2x)^n-2x)^\q)  m®  yA:(A:-l)y  (n  -  2)(fe-2)2®  \ 


k-l\^ 


{m  —  2x)^(n  —  2xY  yk(k  -  1) y  (n  ~ 


^a;  X  r— 1  x-l  f  k[k  1)  \ 

1  m  n  n  I  - = —  I 

\  4n^  / 


—  \  \Tly  J 

(„  _  (|i  _  k  {lY'J 


\  /  Axrr 

^  ~  “  2A:a;)*-i )  ""  \k(k  -  l)(m  -  3a;)2(n  -  2kx)i^- 


<EiB.)^{l-k(D 


X  1  + 


k{k  —  l)(77z  —  3a:)2(n  —  2kxY 


k{k  —  l)(m  —  3x)^(n  —  2kxY 


<  E{B.r  (i  -  ^  (1) 

/  12x'^n^  / 

^  \  k{k  —  l)(m  —  3xY{n  —  2kxY~^  ^  \k{k  —  l)(m  —  3ic)^(n  —  2kxY'^^ 

=:^(5.)2(l  +  o(l))(l  +  o(l/n)) 

since  —  1)(?72  —  ZxY{n  —  2kxY^^)  —¥  x^n/{k{k  —  l)m?)  —  0{x^/n)  and 

n{4xn^  /  {k{k  —  l)(m  —  3xY{n  —  2kxY~^)Y  — )•  n(4a:n/(^(fc  —  l)m^)Y  —  o(l/n^’"^) 
due  to  m  ~  3aj  ^  m,  n  —  2kx  — >•  n,  and  m/n  =  r  >  4/(fc^  —  A;).  □ 

3.2. Q-Horn Analysis.

Definition 3.4. For $x = \lfloor \ln^2 n \rfloor \ge 4$, call a set of x clauses a c-cycle if all but
two literals can be removed from each of x - 2 clauses, all but three literals can be
removed from two clauses, the variables can be relabeled, and the clauses can be
reordered in the following sequence

$$\{v_1, \bar{v}_2\}\{v_2, \bar{v}_3\} \ldots \{v_i, \bar{v}_{i+1}, v_{x+1}\} \ldots \{v_j, \bar{v}_{j+1}, \bar{v}_{x+1}\} \ldots \{v_x, \bar{v}_1\},$$

where $v_i \ne v_j$ if $i \ne j$. Given a c-cycle $\mathcal{C} \subset \mathcal{F}$, if none of the literals removed from
$\mathcal{C}$ are the same or complementary, then $\mathcal{C}$ is called a q-blocked c-cycle.

Lemma 3.5. If a formula $\mathcal{F}$ has a q-blocked c-cycle then it is not q-Horn.

Proof: Let a q-blocked c-cycle in $\mathcal{F}$ be represented as follows

$$\{v_1, \bar{v}_2, \ldots\}, \ldots, \{v_i, \bar{v}_{i+1}, v_{x+1}, \ldots\}, \ldots, \{v_j, \bar{v}_{j+1}, \bar{v}_{x+1}, \ldots\}, \ldots, \{v_x, \bar{v}_1, \ldots\}.$$

Develop inequalities (1) and (2) for the clauses above. We get, after rearranging
terms in each

$$\alpha_1 \le Z - 1 + \alpha_2 - \ldots \qquad (3)$$

$$\vdots$$

$$\alpha_i \le Z - 1 + \alpha_{i+1} - \alpha_{x+1} - \ldots$$

$$\vdots$$

$$\alpha_j \le Z - 1 + \alpha_{j+1} - (1 - \alpha_{x+1}) - \ldots$$

$$\vdots$$

$$\alpha_x \le Z - 1 + \alpha_1 - \ldots \qquad (4)$$

From inequalities (3) to (4) we deduce

$$\alpha_1 \le xZ - x + \alpha_1 - \alpha_{x+1} - (1 - \alpha_{x+1}) - \ldots$$

or

$$0 \le xZ - x - 1 - \ldots$$

where all the terms in ... are non-negative. Thus, all solutions to (3) through (4)
require $Z \ge (x+1)/x = 1 + 1/x = 1 + 1/\lfloor \ln^2 n \rfloor > 1 + 1/n^{\beta}$ for any fixed β < 1.
This violates the result of [4] that requires Z ≤ 1 in order for $\mathcal{F}$ to be q-Horn. □

Theorem 3.6. Under the constant clause-width model, the probability that a random formula $\mathcal{F}$ is q-Horn
tends to 0 if r > 4c/(k^2 - k), c > 1 a constant.

Proof: This is another application of the second moment method closely following
that of Theorem 3.3. This time we find the expected number of q-blocked
c-cycles in $\mathcal{F}$ and show that this expectation is large and the variance small over the
indicated range of r. Then, by Lemma 3.5, the result follows.

Taking advantage of the remarkable similarities between q-blocked c-cycles and
blocked equivalence cycles, we need modify the proof of Theorem 3.3 only by the
small changes due to a slightly different probability of the event being measured
and count of the number of possibilities. Thus,

rrfn^~^  is  replaced  by 

and 

(  ^  (n  —  is  replaced  by 

\  / 

l)y~^  l)^fc-2)^^  (n  -  a;  -  2)('=-2)(^-2)+2(*-3)^ 

(k  —  l\ 

I  —  k  (^)  J  is  not  used.  The  details  are  omitted.  □ 

3.3.  Cycles.  As  stated  before,  a  cycle  in  a  formula  is  an  undirected  cycle  in 
the  graph  formed  by  considering  each  clause  as  a  node  and  connecting  each  pair 
of  clauses  that  share  a  variable.  A  formula  without  a  cycle  is  trivially  satisfied  by 
assigning  values  to  variables  satisfying  the  “leaf”  clauses,  working  inward  to  the 
root(s).  A  formula  with  no  cycles  is  a  member  of  all  the  well  known  polynomially 
solvable  subclasses  except  for  Horn  (however,  it  is  renamable  Horn).  The  results 
above for SLUR and q-Horn show that cycles in a random formula are abundant if
r > 4/(k(k - 1)). The next theorem shows that random formulas have few cycles
if r < .618/(k(k - 1)).
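The notion of cycle used here can be made concrete with a short sketch. The Python code below builds the clause graph implicitly (clauses are nodes, and two clauses are joined when they share a variable) and uses a union-find structure to detect an undirected cycle. The function name and the signed-integer encoding of literals are assumptions for this example, and the pairwise comparison is quadratic in the number of clauses, which is adequate only for illustration.

```python
from itertools import combinations

def has_cycle(formula):
    """True if the clause graph of the formula contains an undirected cycle.

    Nodes are clause positions; an edge joins two clauses sharing a variable.
    An edge whose endpoints are already connected closes a cycle.
    """
    var_sets = [{abs(lit) for lit in clause} for clause in formula]
    parent = list(range(len(formula)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    for i, j in combinations(range(len(formula)), 2):
        if var_sets[i] & var_sets[j]:
            ri, rj = find(i), find(j)
            if ri == rj:
                return True
            parent[ri] = rj
    return False
```

Note that under this definition three clauses that all contain the same variable already form a triangle in the clause graph.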

Theorem 3.7. Under the constant clause-width model, the average number of cycles in a random
formula $\mathcal{F}$ is less than 1 when r = m/n < .618/(k(k - 1)).

Proof: We find the expected number of cycle patterns in $\mathcal{F}$, which is an overestimate
of the expected number of cycles in $\mathcal{F}$. The number of cycle patterns
involving a sequence of x clauses is $m^x n^x$. The probability that a sequence of x
clauses matches a pattern of x clauses is $\left(k(k-1)/(n(n-1))\right)^x$. Therefore, the expected
number of cycle patterns in $\mathcal{F}$ is

$$\sum_{x \ge 2} m^x n^x \left(\frac{k(k-1)}{n(n-1)}\right)^{x} = \sum_{x \ge 2} \left(\frac{mk(k-1)}{n-1}\right)^{x}.$$

This is less than 1 if mk(k - 1)/(n - 1) < .618. □
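The constant .618 can be checked numerically. The sketch below assumes, as the proof indicates, that the expected number of cycle patterns is bounded by a geometric series in a single quantity y (the quantity compared with .618 above), summed over cycle lengths x ≥ 2. The names `geometric_tail` and `threshold` are illustrative only.

```python
import math

def geometric_tail(y):
    """Sum of y**x over x = 2, 3, ... for 0 <= y < 1, in closed form."""
    if not 0 <= y < 1:
        raise ValueError("the series converges only for 0 <= y < 1")
    return y * y / (1 - y)

# y^2 / (1 - y) < 1  is equivalent to  y^2 + y - 1 < 0,
# whose positive root is (sqrt(5) - 1) / 2.
threshold = (math.sqrt(5) - 1) / 2

print(round(threshold, 3))
print(geometric_tail(0.60) < 1, geometric_tail(0.63) < 1)
```

Under this assumption the bound is exactly 1 at y = (√5 - 1)/2, below 1 for smaller y, and above 1 for larger y.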

This  result  shows  how  closely  SLUR,  q-Horn,  and  other  subclasses  are  tied  to 
cycles  in  formulas:  it  seems  that  they  are  defeated  rapidly  by  the  presence  of  cycles 
(that  is,  as  r  rises,  formulas  are  not  SLUR,  q-Horn,  etc.  soon  after  they  begin  to 
contain  a  significant  number  of  cycles). 

3.4. Easy Unsatisfiable Families of Formulas. Since it is based on unit
resolution, one of the drawbacks of SLUR is that it fails to provide a proof of unsatisfiability
for all but some trivial unsatisfiable formulas. On the other hand, it is
not hard to find non-trivial unsatisfiable formulas that are q-Horn. The question,
whether q-Horn contains relatively many unsatisfiable formulas, seems to have the
answer no from the results above since q-Horn formulas do not appear in abundance
if r > 4/(k(k - 1)) but formulas are satisfiable with high probability if $r < 0.25 \cdot 2^k/k$.
Indeed, the results above show that both SLUR and q-Horn are equally handicapped
in solving unsatisfiable formulas. On top of this, it can be shown that there are
families of unsatisfiable formulae that are easy to solve but are not in either SLUR
or q-Horn. For example, the following result is proved in a forthcoming paper by
Franco and Van Gelder.

Theorem  3.8.  // lim^n^n^co  =  oo,  a  random  formula  T  is  un¬ 

satisfiable  and  can  be  solved  in  polynomial  time  with  probability  1  —  o(l). 

4.  Discussion  and  Conclusions 

The aim of this paper is to determine the relative sizes of some well known
polynomially solvable subclasses of Satisfiability. We used the constant clause-width model and the ratio
r = m/n to provide a scale of formula “hardness” and determined where, on that
scale, random formulas are members of the subclasses with high probability. We
found that random formulas are not SLUR or q-Horn about where formula cycles
begin to appear. Thus, neither subclass dominates in any range of r except where
formulas are extremely “easy.” The weakness of all the subclasses is that some local
property can defeat them. In the case of SLUR and q-Horn this local property is
the presence of cycles and we showed that both SLUR and q-Horn are about equally
handicapped by this: that is, they are defeated by cycles that are similar in nature.
This is surprising since, except for trivial cases, SLUR is useless on unsatisfiable
formulas and q-Horn can solve non-trivial unsatisfiable formulas. Because of this,
we had expected that q-Horn would dominate in some range of r where formulas
are unsatisfiable with high probability. But this turned out not to be the case even
though there is a range of r where unsatisfiable formulas are “easy.”

We have also observed something unexpected about the Satisfiability index of
(1)-(2) and [4]. According to this index, the subclass of formulas that satisfy a set of
constraints with parameter $Z = 1 + c\,(\ln n)/n$ is polynomially solvable and the subclass
of formulas that satisfy these constraints with $Z = 1 + 1/n^{\beta}$ for any 1 > β > 0
is NP-complete. However, from Lemma 3.5 and Theorem 3.6, nearly all formulas
satisfy the constraints only with $Z > 1 + 1/n^{\beta}$ for any β > 0, if r > 4/(k(k - 1)), and in
this range, up to $r = 0.25 \cdot 2^k/k$, most random formulas are very easy but not usually
in one of the well known polynomially solved subclasses of Satisfiability.